ABSTRACT

Non-stationary signals are not well suited for detection and
classification by traditional Fourier methods. An alternate means
of analysis needs to be employed so that valuable time-frequency
information is not lost. The wavelet packet transform [1] is one
such time-frequency analysis tool. This paper summarizes efforts
[2] which examine the feasibility of applying the wavelet packet
transform to automatic transient signal classification through the
development of a classification algorithm for biologically
generated underwater acoustic signals in ocean noise. The
formulation of a wavelet packet based feature set specific to the
classification of snapping shrimp and whale clicks is given.


1  INTRODUCTION 

Over the last decade much work has been done in applying
time-frequency transforms to the problem of signal representation
and classification. Mallat's work on the application of wavelets
to image representation [3] and Daubechies's work on the
development of smooth orthonormal wavelet basis functions with
compact support [4] sparked a great deal of interest in wavelets
in the engineering community. Most recently, the emergence of
wavelet theory has motivated a considerable amount of research in
transient and non-stationary signal analysis.

This paper discusses the use of the wavelet packet transform in
the detection and classification of transient signals in
background noise. Our approach focuses on the exploitation of
class-specific differences obtained through careful examination of
the feature separation attainable from the wavelet packet
decomposition of the transients. The Charles Stark Draper
Laboratory and the Naval Underwater Systems Center furnished an
extensive collection of acoustic signals in background noise which
allowed for an empirical study of some typical occurrences of
snapping shrimp and whale clicks. A wavelet packet based feature
set specific to the classification of snapping shrimp and whale
clicks is formulated.

1.1  Motivation 

The ability to classify underwater acoustic signals is of great
importance to the Navy. Today, detection and classification,
tailored for stationary signals, is done by the sonar officer who
listens to incoming signals and determines their origins with the
aid of a frequency display and look-up tables. Transient signals,
lasting only a fraction of a second, are of particular concern
because they will typically appear as broad band energy on the
frequency display; thus, the sonar officer must be able to detect
and classify these signals after only listening to them. These
brief signals, such as the single acoustic transmission due to the
closing of a door within a ship, may be missed by the sonar
officer. Success or failure in the classification of transient
signals using traditional methods relies solely on the officer's
ability to detect and classify a signal after hearing it only
once. An automatic method of classification for transient signals
would greatly aid in the detection/classification process.

1.2  Shortcomings  of  Fourier  Methods 

Transient signals are not well matched to standard spectral
analysis methods. In particular, Fourier-based methods are ideally
suited to the extraction of narrow band signals whose duration
exceeds or is at least on the order of the Fourier analysis window
length. That is, for sources of this type Fourier analysis,
particularly the short-term Fourier transform (STFT), does an
excellent job of focusing the information, thus providing features
(spectral amplitudes) perfectly suited to detection and
discrimination. The STFT does allow for some temporal as well as
frequency resolution, but it is not well suited to the analysis of
many transient signals and, in particular, to the generation of
features for detection and discrimination. The STFT may be viewed
as a uniform division of the time-frequency space. It is
calculated for consecutive segments of time using a predetermined
window length. The accuracy of the STFT for extracting localized
time/frequency information is limited by the length of this window
relative to the duration of the signal. If the window is long in
comparison with the signal duration, there will be time averaging
of the spectral information in that window. On the other hand, the
window must be long enough so that there is not excessive
frequency distortion of the signal spectrum. The STFT with its
non-varying window is not readily adaptable for capturing
signal-specific characteristics. Additionally, all time resolution
is lost within each window. We look to the wavelet packet
transform for a bit more freedom in dealing with this
time-frequency trade off.
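The window-length trade-off described above can be made concrete with a little arithmetic. The following is a minimal sketch rather than part of the original study: the sampling rate `FS_HZ` is an assumption (roughly the rate implied by 4096-sample excerpts lasting about 163.8 ms), and the function name `stft_resolution` is hypothetical. For a window of N samples, the window spans N/fs seconds and the DFT bins are spaced fs/N hertz apart.

```python
FS_HZ = 25_000.0  # assumed sampling rate; not stated explicitly in the text


def stft_resolution(window_len, fs_hz=FS_HZ):
    """Return (window span in ms, DFT bin spacing in Hz) for one STFT window."""
    time_span_ms = 1000.0 * window_len / fs_hz
    bin_spacing_hz = fs_hz / window_len
    return time_span_ms, bin_spacing_hz


if __name__ == "__main__":
    for n in (32, 256, 4096):
        dt_ms, df_hz = stft_resolution(n)
        print(f"N={n:5d}  window={dt_ms:8.2f} ms  bin spacing={df_hz:8.2f} Hz")
```

Because the product of the two quantities is fixed, a window short enough to isolate a very brief transient gives only coarse frequency bins, while a window long enough for fine frequency bins averages over the whole transient.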

1.3  Current  Work  In  This  Area 

Current work in the area of underwater acoustic transient
classification using wavelet related concepts has been done by
Nicolas [5] and, more recently, Desai and Shazeer [6]. They both
employ a wavelet packet transform as a means of generating class
dependent features from various classes of underwater acoustic
transients for input to a neural network. In both studies,
exploitation of class dependent frequency characteristics is
suppressed by using a predetermined wavelet packet basis (or
orthonormal division of the frequency space). The choice of the
wavelet packet basis appears to be ad hoc in both cases. By
limiting the input to one signal-independent feature set, the
adaptability of the neural network was left unexploited. These
methods also ignore the redundancy between parent and children
bins of the transform (discussed later in this paper).
Additionally, by prohibiting signal specific division of the
time-frequency space there can be no exploitation of any class
dependent frequency variations. A natural expansion of their works
is to address the issue of finding a wavelet packet based feature
set that offers maximum feature separability due to class-specific
characteristics.

2  THEORY 

2.1  Wavelet  Packet  Decomposition  (WPD) 

The wavelet packet decomposition (WPD) of a signal can be viewed
as a step by step transformation of the signal from the time
domain to the frequency domain. The top level of the WPD tree is
the time representation of the signal. With each level traversed
down the tree, time resolution is traded for frequency resolution.
The bottom level of a fully decomposed tree is a frequency
representation of the signal.

Figure 1: The fully decomposed wavelet packet tree
for a signal of length eight.

This section presents the ideas developed by Wickerhauser [1]
extending wavelet concepts to wavelet packets. Using
Wickerhauser's notation, let h(n) and g(n) be the finite impulse
response low-pass and high-pass filters derived from the wavelet
chosen for the decomposition. Let x be the vector having elements
$x_n = x(n)$, where x(n) is the original discrete-time sequence
that we wish to decompose via the wavelet packet method. Let $F_0$
and $F_1$ be the operators which perform the convolution with h(n)
and g(n), respectively, followed by a decimation by two. The
convolution and decimation steps in the WPD can be interpreted as
discrete-time filtering and downsampling.

The full WPD can be displayed as a tree with a discrete sequence
represented by a bin vector at the end of each branch. The
original discrete signal is on the first level. The bin vectors at
each level are calculated by applying $F_0$ and $F_1$ to the bin
vectors of the previous level. Figure 1 shows a WPD tree for a
signal of length 8. Wickerhauser uses the notation s and d to
represent the sequences resulting from the applications of $F_0$
and $F_1$ to x.


$$F_0 x = s \quad \text{and} \quad F_1 x = d$$


Figure  2:  Energy  mapping  of  the  top  five  levels  of  a 
WPD  tree.  The  eleven  numbered  bins  comprise  the 
energy  vector  used  for  analysis  in  Section  3. 


In Figure 1, $s_i$ and $d_i$ are the $i$th components of s
and d.

The s and d are used as prefixes for the bin vector symbols
throughout the tree because the low-pass filter-decimation
operation can be compared to a sum and the high-pass
filter-decimation can be compared to a difference. For example, at
the third level of the tree ss, ds, sd, and dd are the vectors
resulting from the filtering-decimation operations

$$F_0 s = ss \quad \text{and} \quad F_1 s = ds$$

and

$$F_0 d = sd \quad \text{and} \quad F_1 d = dd.$$

Due to the decimation, each bin vector contains half as many
elements as its parent bin vector. The decomposition can be
carried down to the final level where there is only one element in
each bin vector.
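The tree construction described above can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the implementation used in the study: the Haar filters are an assumed choice (the text does not name the wavelet), periodic extension at the vector ends is an assumption, and the names `filter_decimate` and `wpd_tree` are hypothetical. Bin names follow the prefix convention of the text, so applying $F_1$ to s yields ds.

```python
import math

# Haar filters: an assumed choice; the text does not name the wavelet.
H = (1 / math.sqrt(2), 1 / math.sqrt(2))   # low-pass h(n)
G = (1 / math.sqrt(2), -1 / math.sqrt(2))  # high-pass g(n)


def filter_decimate(x, taps):
    """Filter x with taps (periodic extension) and keep every second output."""
    n = len(x)
    return [sum(t * x[(2 * k + j) % n] for j, t in enumerate(taps))
            for k in range(n // 2)]


def wpd_tree(x):
    """Return a list of levels; each level maps a bin name to its bin vector."""
    levels = [{"x": list(x)}]
    while len(next(iter(levels[-1].values()))) > 1:
        children = {}
        for name, vec in levels[-1].items():
            parent = "" if name == "x" else name
            children["s" + parent] = filter_decimate(vec, H)  # F_0
            children["d" + parent] = filter_decimate(vec, G)  # F_1
        levels.append(children)
    return levels


if __name__ == "__main__":
    tree = wpd_tree([1.0, 2.0, 3.0, 4.0, 4.0, 3.0, 2.0, 1.0])
    for depth, level in enumerate(tree, start=1):
        print(depth, {k: [round(v, 3) for v in vec] for k, vec in level.items()})
```

For a signal of length eight this produces four levels holding 1, 2, 4 and 8 bin vectors, the last level having one element per bin. With orthonormal filters such as Haar, the summed squared values of all bins are the same at every level.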

2.2  Energy  Mapping  of  the  WPD 

An intuitively pleasing representation of the WPD tree is one that
highlights the energy distribution of the signal as it is
decomposed down the tree. Such an energy map calculates an energy,
$e_y$, from each bin vector, y. A simple energy calculation is the
total energy in each bin

$$e_y = \frac{1}{2^N} \sum_{i=1}^{2^N} y_i^2 \qquad (1)$$

where $2^N$ is the number of elements in y.
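As a minimal sketch of the energy map, the code below computes one energy per bin. It assumes that the energy in (1) is the sum of squared elements divided by the number of elements in the bin; the normalization factor is hard to read in the source, so treat it as an assumption. The function names and the toy bin values are hypothetical.

```python
def bin_energy(y):
    """Energy of a bin vector: sum of squares divided by its length.

    The 1/len(y) normalization is an assumed reading of Eq. (1).
    """
    return sum(v * v for v in y) / len(y)


def energy_map(levels):
    """Map each bin name to its energy, level by level."""
    return [{name: bin_energy(vec) for name, vec in level.items()}
            for level in levels]


if __name__ == "__main__":
    # Toy two-level tree with made-up bin values, for illustration only.
    toy_levels = [{"x": [1.0, -1.0, 2.0, -2.0]},
                  {"s": [3.0, 1.0], "d": [1.0, -1.0]}]
    print(energy_map(toy_levels))
```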

There is, however, freedom in the choice of the calculation of
$e_y$. For example, due to the short duration of a shrimp snap
relative to a whale click, it may be beneficial to calculate a
windowed energy in each bin by calculating the energy in adjacent
or overlapping segments of the bin vector, choosing a segment
length small enough to encompass one snap at a time. Figure 2
shows the energy mapping of the top five levels of a WPD tree.

Figure 3: A typical whale click and snapping shrimp.
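A windowed energy of the kind suggested above might look like the following sketch. The segment length, the hop size, and the per-segment normalization are all assumptions chosen for illustration, and `windowed_energy` is a hypothetical name. A hop equal to the segment length gives adjacent segments, and a smaller hop gives overlapping ones.

```python
def windowed_energy(y, seg_len, hop=None):
    """Energy of consecutive segments of y; hop < seg_len gives overlap."""
    hop = seg_len if hop is None else hop
    if seg_len <= 0 or hop <= 0:
        raise ValueError("seg_len and hop must be positive")
    # Each segment energy is normalized by the segment length (assumed).
    return [sum(v * v for v in y[start:start + seg_len]) / seg_len
            for start in range(0, len(y) - seg_len + 1, hop)]


if __name__ == "__main__":
    # Made-up bin vector with one short burst, standing in for a snap.
    y = [0.0, 0.0, 0.0, 0.0, 3.0, -3.0, 0.0, 0.0]
    print(windowed_energy(y, 2))         # adjacent segments
    print(windowed_energy(y, 2, hop=1))  # overlapping segments
```

In this toy case the segment containing the burst has an energy of 9, whereas averaging over the whole bin vector gives 2.25, which illustrates why a short segment can make a brief event stand out.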

3  CHOOSING  THE  FEATURE  SET 

In  the  formulation  of  a  decision  rule,  it  is  desirable  to 
find  a  feature  set  which  captures  characteristics  unique 
to  each  class  of  signals.  Typically,  the  feature  set  uses 
a  greatly  reduced  number  of  parameters  in  comparison 
with  the  number  of  samples  in  the  signal. 

Our feature set was found via an empirical study of the data using
54 excerpts from the NUSC data records. The duration of each
excerpt is 4096 samples or 163.8 milliseconds. A typical whale
click will have a duration of approximately 80 to 120 milliseconds
and a single snap of a shrimp will have a duration on the order of
1 millisecond. The 4096 sample window will contain anywhere from
one snap to a very large number of snaps and can entirely
encompass one whale click. The sample signal data base comprises
18 isolated whale clicks, 18 background noise excerpts, and 18
snapping shrimp excerpts. Figure 3 shows the time plots of a
typical whale click and some snapping shrimp.

The transformation of the WPD trees into energy maps using (1)
showed promising clarification of information. We began analysis
of the 54 sample energy maps using the eleven bin energies
corresponding to numbered bins shown in Figure 2. The choice of
these eleven bins is discussed in greater detail in [2]. Let
$e_k^t$ be the energy vector containing these eleven bin energies,
where k = 1,...,18 for each t = shrimp, click, or noise.


Figure 4: Normalized energies from bins 3 and 4 of the
sample energy maps for snapping shrimp and whale
clicks (plot legend: clicks = +, shrimp = *).

Each bin energy contains both signal and noise energies. Before
continuing the search for a reduced parameter feature vector, the
influence of noise on these eleven bin energies was compensated
for by normalizing each of the eleven bin energies by the
corresponding average noise energy as discussed in [2]. Each of
the 54 energy vectors, $e_k^t$, corresponds to the eleven bin
energies from each of the 54 sample energy maps. From these we
find 54 normalized energy vectors, $\bar{e}_k^t$, containing the
corresponding normalized bin energies.
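The normalization step can be sketched as follows. The exact procedure is given in [2] and is not reproduced here, so this sketch assumes the simplest reading: each bin energy is divided by the mean energy of the same bin over the noise-only excerpts. The function names and the three-bin toy values are hypothetical.

```python
def average_noise_energy(noise_vectors):
    """Per-bin mean of the energy vectors taken from noise-only excerpts."""
    count = len(noise_vectors)
    return [sum(vec[b] for vec in noise_vectors) / count
            for b in range(len(noise_vectors[0]))]


def normalize(energy_vector, noise_mean):
    """Divide each bin energy by the average noise energy of the same bin.

    Elementwise division is an assumed reading of the normalization in [2].
    """
    return [e / m for e, m in zip(energy_vector, noise_mean)]


if __name__ == "__main__":
    # Made-up three-bin energy vectors, for illustration only.
    noise = [[1.0, 2.0, 4.0], [3.0, 2.0, 4.0]]
    mean = average_noise_energy(noise)
    print(mean, normalize([4.0, 2.0, 2.0], mean))
```

Under this reading, a normalized bin energy near one indicates a bin dominated by noise, and a value well above one indicates signal energy in that bin.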

A quantitative analysis was done by grouping the 54 normalized
energy vectors into three classes and arranging them into three
matrices, $E_t$, having columns $\bar{e}_1^t$ through
$\bar{e}_{18}^t$ for each class, t = click, shrimp, and noise.
Singular value decomposition of each $E_t$ reveals one significant
singular value, $\sigma_t$, and singular vector, $u_t$, for each
class, t. All other singular values were negligible. From
examination of these three singular vectors we found that the bins
numbered 3, 4, 7, 8 and 9 in Figure 2 contain the dominant
information.
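To illustrate how one dominant singular vector can point to the informative bins, the sketch below finds the largest singular value and its left singular vector by power iteration. This is a stand-in for a full singular value decomposition, chosen so that the example needs only the standard library. The data are synthetic, the function names are hypothetical, and selecting bins by the magnitude of the singular vector entries is an assumption about how the vectors were examined.

```python
import math


def dominant_singular_pair(columns, iters=200):
    """Power iteration for the largest singular value and left singular vector.

    `columns` is a list of equal-length energy vectors (the columns of E_t).
    """
    dim = len(columns[0])
    u = [1.0 / math.sqrt(dim)] * dim
    for _ in range(iters):
        # w = E^T u (one coefficient per column), then v = E w
        w = [sum(c[i] * u[i] for i in range(dim)) for c in columns]
        v = [sum(w[k] * columns[k][i] for k in range(len(columns)))
             for i in range(dim)]
        norm = math.sqrt(sum(x * x for x in v))
        if norm == 0.0:
            raise ValueError("matrix is zero or start vector is orthogonal")
        u = [x / norm for x in v]
    w = [sum(c[i] * u[i] for i in range(dim)) for c in columns]
    sigma = math.sqrt(sum(x * x for x in w))
    return sigma, u


def dominant_bins(u, count):
    """1-based indices of the `count` largest-magnitude entries of u."""
    order = sorted(range(len(u)), key=lambda i: abs(u[i]), reverse=True)
    return sorted(i + 1 for i in order[:count])


if __name__ == "__main__":
    # Synthetic rank-one example: every column is a multiple of [0, 3, 4].
    cols = [[0.0, 3.0, 4.0], [0.0, 6.0, 8.0]]
    sigma, u = dominant_singular_pair(cols)
    print(round(sigma, 4), [round(x, 4) for x in u], dominant_bins(u, 2))
```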

Reduction of the feature vector is desirable for the
simplification of the decision rule, and superfluous information
should be avoided. A feature set which contains a parent bin
energy and all of its descendant bin energies may be redundant
because any parent bin vector of the WPD tree can be constructed
from a linear combination of its children bin vectors. The feature
set need not include all of the energies in bins of the energy map
that are related in this way. This will also minimize the
computational complexity of the energy vector because many bins of
the WPD tree will not be used and will, therefore, not be
calculated. Bins 3 and 4 are parents to 7, 8 and 9; thus, we begin
with only the 3rd and 4th bin energies. Figure 4 plots the
normalized energies from bins 3 and 4 of the 54 sample energy
maps. There is excellent separation between the click and shrimp
features.
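The redundancy between a parent bin and its children, which motivates dropping bins 7, 8 and 9, can be checked numerically. The sketch below assumes Haar filters (the text does not name the wavelet) and uses hypothetical function names and made-up values. It splits a parent bin vector into its s and d children and then rebuilds the parent as a linear combination of them.

```python
import math

R = 1 / math.sqrt(2)  # Haar filter coefficient (assumed wavelet)


def split(parent):
    """Haar F_0 and F_1: scaled pairwise sums (s) and differences (d)."""
    s = [R * (parent[i] + parent[i + 1]) for i in range(0, len(parent), 2)]
    d = [R * (parent[i] - parent[i + 1]) for i in range(0, len(parent), 2)]
    return s, d


def merge(s, d):
    """Rebuild the parent bin vector as a linear combination of s and d."""
    parent = []
    for a, b in zip(s, d):
        parent.extend((R * (a + b), R * (a - b)))
    return parent


if __name__ == "__main__":
    parent = [4.0, 2.0, -1.0, 5.0]  # made-up bin vector
    s, d = split(parent)
    print([round(v, 6) for v in merge(s, d)])
```

For orthonormal filters such as these, the summed squared values of the two children equal those of the parent, so a parent bin energy carries no information beyond what its children's energies already provide.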

4  CONCLUSION  AND  FUTURE  WORK 

This paper has presented results for the case of snapping shrimp
and whale clicks; we are able to find a wavelet packet based
feature set containing only two parameters which offers excellent
separation of class-specific characteristics. These features will
greatly simplify the classification process for these two classes
of signals.

In ongoing work, we are formulating a number of methods for
detection and classification using neural networks and various
pattern recognition techniques that lend themselves to the
classification of signals using features of a limited number of
sample signals as a training set. We are examining the robustness
of the detection and classification algorithms derived from this
reduced parameter feature set by running them on the entire NUSC
data base which includes underwater sounds generated by popping
ice, porpoise whistles, and whale cries in addition to many
occurrences of snapping shrimp and whale clicks. A detailed
discussion of the derivation and performance of different
algorithms used with the wavelet packet based feature set is given
in [2].